nlp_architect.data.sequential_tagging.CONLL2000

class nlp_architect.data.sequential_tagging.CONLL2000(data_path, sentence_length=None, max_word_length=None, extract_chars=False, lowercase=True)

CONLL 2000 POS/chunking task data set (numpy)

Parameters:
  • data_path (str) – directory containing CONLL2000 files
  • sentence_length (int, optional) – number of time steps (tokens) per sentence used to embed the data. If None, vectors are not truncated
  • max_word_length (int, optional) – maximum word length in characters. If None, vectors are not truncated
  • extract_chars (bool, optional) – if True, also yield character-level features for a char RNN
  • lowercase (bool, optional) – if True, lowercase the words in each sentence
__init__(data_path, sentence_length=None, max_word_length=None, extract_chars=False, lowercase=True)

Initialize self. See help(type(self)) for accurate signature.
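
The constructor and the train_set / test_set attributes can be combined roughly as follows. This is a minimal sketch, not part of the library: the helper name load_conll2000_splits and its dataset_cls argument are hypothetical and exist only so the import of nlp_architect happens lazily. This page does not specify the structure of the objects returned by train_set and test_set, so the sketch passes them through unchanged.

```python
def load_conll2000_splits(data_path, dataset_cls=None, **kwargs):
    """Build the dataset object and return (train_set, test_set, dataset).

    Hypothetical helper for illustration only. Keyword arguments are
    forwarded to the constructor as documented above, e.g.
    sentence_length, max_word_length, extract_chars, lowercase.
    """
    if dataset_cls is None:
        # Lazy import: nlp_architect is only required when no class is injected.
        from nlp_architect.data.sequential_tagging import CONLL2000
        dataset_cls = CONLL2000

    dataset = dataset_cls(data_path, **kwargs)
    # train_set and test_set are the attributes listed on this page;
    # their contents are returned as-is.
    return dataset.train_set, dataset.test_set, dataset
```

With nlp_architect installed, a call such as load_conll2000_splits("/path/to/conll2000", sentence_length=50, extract_chars=True, max_word_length=20) would construct the class with those settings; the path and the numeric values here are placeholders.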

Methods

__init__(data_path[, sentence_length, …]) Initialize self.

Attributes

char_vocab character Vocabulary
chunk_vocab chunk label Vocabulary
dataset_files
pos_vocab POS label Vocabulary
test_set get the test set
train_set get the train set
word_vocab word Vocabulary
char_vocab

character Vocabulary

chunk_vocab

chunk label Vocabulary

dataset_files = {'test': 'test.txt', 'train': 'train.txt'}
pos_vocab

POS label Vocabulary

test_set

get the test set

train_set

get the train set

word_vocab

word Vocabulary
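
Since data_path is described as a directory containing the CONLL2000 files and dataset_files maps the two splits to train.txt and test.txt, it can be useful to check the directory before constructing the class. The following is a sketch under that assumption; the function missing_dataset_files is hypothetical and is not provided by nlp_architect.

```python
import os

# Copied from the dataset_files attribute documented above.
DATASET_FILES = {"test": "test.txt", "train": "train.txt"}


def missing_dataset_files(data_path, dataset_files=DATASET_FILES):
    """Return the sorted file names from dataset_files not found in data_path."""
    return sorted(
        name
        for name in dataset_files.values()
        if not os.path.isfile(os.path.join(data_path, name))
    )
```

An empty list means both expected files are present in the directory; any returned names are the files that would still need to be placed there.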